Using Wikipedia to Validate the Terminology found in a Corpus of Basic Textbooks
Abstract
A scientific vocabulary is a set of terms that designate scientific concepts. This set of lexical units can be used in many applications, ranging from terminological dictionaries and machine translation systems to lexical databases and beyond. Although automatic term recognition systems have existed since the 1980s, term identification is still done mainly by hand, because manual work generally yields more accurate results, albeit more slowly and at a higher cost. Reasons for this include the fairly low precision and recall of existing tools, their domain dependence, and the lack of available semantic knowledge needed to validate their output. In this paper we present a method that uses Wikipedia as a semantic knowledge resource to validate term candidates from a set of scientific textbooks used in the last three years of high school for mathematics, health education and ecology. The proposed method may be applied to any domain or language, provided that Wikipedia covers it at least minimally.
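The abstract does not spell out how the validation works, so the following is only a minimal Python sketch of the general idea, not the authors' method: a term candidate is kept if it matches a Wikipedia article title, and the accepted set is scored with precision and recall against a hand-validated list. The functions `normalize`, `validate_candidates` and `precision_recall` are hypothetical, and all the data (a handful of titles standing in for a real title dump, the candidates, the reference list) is invented for illustration.

```python
"""Hypothetical sketch: filter term candidates by looking them up in a set of
article titles, then score the result with precision and recall.
The title set below is a tiny stand-in, not real Wikipedia data."""


def normalize(term: str) -> str:
    # Lowercase and collapse internal whitespace so "Food  Chain" matches "food chain".
    return " ".join(term.lower().split())


def validate_candidates(candidates: list[str], titles: set[str]) -> list[str]:
    # Keep a candidate only if its normalized form equals some normalized title.
    normalized_titles = {normalize(t) for t in titles}
    return [c for c in candidates if normalize(c) in normalized_titles]


def precision_recall(accepted: list[str], gold: set[str]) -> tuple[float, float]:
    # Precision: share of accepted candidates that are true terms.
    # Recall: share of true terms that were accepted.
    accepted_set = {normalize(a) for a in accepted}
    gold_set = {normalize(g) for g in gold}
    true_positives = len(accepted_set & gold_set)
    precision = true_positives / len(accepted_set) if accepted_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical article titles standing in for a Wikipedia title dump.
    wikipedia_titles = {"Prime number", "Ecosystem", "Food chain", "Vaccine", "Table"}
    # Hypothetical candidates produced by some earlier extraction step.
    candidates = ["prime number", "Ecosystem", "the following", "food  chain", "table"]
    # Hypothetical hand-validated reference list of true terms.
    gold_terms = {"prime number", "ecosystem", "food chain", "derivative"}

    accepted = validate_candidates(candidates, wikipedia_titles)
    p, r = precision_recall(accepted, gold_terms)
    print(accepted)
    print(f"precision={p:.2f} recall={r:.2f}")
```

Running it prints the four accepted candidates followed by `precision=0.75 recall=0.75`. The general word "table" is accepted only because an article with that title exists, which illustrates why a bare title lookup is a weak validator and why richer semantic knowledge is useful.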
Publication date: 2012